PGS Catalog Calculator installation tutorial
Who is this tutorial for?
An understanding of what polygenic scores (PGS) are, why they’re useful, and the PGS Catalog would be helpful before starting this tutorial.
The PGS Catalog Calculator (pgsc_calc) doesn’t have a graphical interface, so you’ll need to be able to open a terminal and run some commands to complete this tutorial.
What will I achieve?
By the end of this tutorial you’ll have:
- Installed
pgsc_calcon your computer - Calculated some test PGS
What resources do I need?
You’ll need a computer with:
- A modern version of Linux 🐧 or macOS 🍏
- A good amount of RAM (16GB or more preferred)
- Permission to install software on your machine
Getting started
Install Docker
If you already have docker installed on your computer, you can skip this section.
If you’re using Docker desktop on macOS, please make sure you’re running v4.30.0 or later or you might experience a “segmentation fault” error when running the calculator
If you need to install Docker, please click on “details” below to learn how.
We use docker to ship our software in containers to run on your computer. There are a lot of different tools for working with containers, but docker is the most popular.
The simplest way to install docker is by downloading Docker Desktop.
You might need to restart your computer after you’ve installed docker for the first time
You can check to see if docker desktop is running by opening a terminal and running:
$ docker run hello-worldYou should see something that looks like:
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
(arm64v8)
3. The Docker daemon created a new container from that image which runs the
executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it
to your terminal.
To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash
Share images, automate workflows, and more with a free Docker ID:
https://hub.docker.com/
For more examples and ideas, visit:
https://docs.docker.com/get-started/Install Nextflow
If you already have Nextflow installed on your computer, you can skip this section.
If you already have Nextflow installed, try running nextflow self-update to grab the latest version
If you need to install Nextflow, please click on “details” below to learn how.
pgsc_calc is built using Nextflow, so you’ll need Nextflow installed on your computer to run it. These installation steps are taken from the Nextflow documentation.
First, check if you have at least Java 11 installed on your computer:
$ java -version # java 21, looks good!
openjdk version "21.0.2" 2024-01-16If you don’t have Java installed, check the Nextflow documentation for next steps.
Then, run:
$ curl -s https://get.nextflow.io | bashThis will create a file called nextflow in the current directory.
Make the file executable:
$ chmod +x nextflowThen make your computer able to find the nextflow program:
$ sudo mv nextflow /usr/local/binThen check if nextflow is installed correctly:
$ nextflow infoYou should see something like:
Version: 23.10.1 build 5891
Created: 12-01-2024 22:01 UTC (22:01 BST)
System: Mac OS X 14.4.1
Runtime: Groovy 3.0.19 on OpenJDK 64-Bit Server VM 21.0.2
Encoding: UTF-8 (UTF-8)If you’d like to learn more about Nextflow, the nf-core community has a great explanation
Checklist
After you’ve completed this checklist, you’ll be ready to install pgsc_calc and calculate some polygenic scores 🧬
- The Docker Desktop application is open and running
- Running
docker run hello-worldin a terminal shows:
Hello from Docker!
This message shows that your installation appears to be working correctly.
...- Running
nextflow run helloin a terminal shows:
N E X T F L O W ~ version 23.10.1
Launching `https://github.com/nextflow-io/hello` [sleepy_wilson] DSL2 - revision: 7588c46ffe [master]
...
Hola world!
Ciao world!
Bonjour world!
Hello world!Calculate some polygenic scores
To install pgsc_calc and calculate some PGS, you can run the calculator using the test profile:
$ nextflow run pgscatalog/pgsc_calc -profile test,docker --outdir resultsWhen you try this for the first time it can take a few minutes to finish, depending on the speed of your internet connection. The calculator code and containers are being downloaded and cached in the background.
If your computer uses the ARM architecture (like modern M1 Macs) you need to change: -profile test,docker to -profile test,docker,arm in the example above
If you’ve used the calculator before, please include the parameter -latest to grab the most recent release
You should see output that looks like:
N E X T F L O W ~ version 23.10.1
Pulling pgscatalog/pgsc_calc ...
Already-up-to-date
Launching `https://github.com/pgscatalog/pgsc_calc` [chaotic_yalow] DSL2 - revision: 8bdf287d55 [main]
------------------------------------------------------
pgscatalog/pgsc_calc v2.0.0-alpha.5-g8bdf287
------------------------------------------------------
Core Nextflow options
revision : main
runName : chaotic_yalow
containerEngine : docker
launchDir : /Users/bwingfield/Documents/projects/pgsc_calc
workDir : /Users/bwingfield/Documents/projects/pgsc_calc/work
projectDir : /Users/bwingfield/.nextflow/assets/pgscatalog/pgsc_calc
userName : bwingfield
profile : test,docker
configFiles :
Input/output options
input : /Users/bwingfield/.nextflow/assets/pgscatalog/pgsc_calc/assets/examples/samplesheet.csv
scorefile : /Users/bwingfield/.nextflow/assets/pgscatalog/pgsc_calc/assets/examples/scorefiles/PGS001229_22.txt
outdir : /Users/bwingfield/.nextflow/assets/pgscatalog/pgsc_calc/results
Reference options
ref_samplesheet : /Users/bwingfield/.nextflow/assets/pgscatalog/pgsc_calc/assets/ancestry/reference.csv
ld_grch37 : /Users/bwingfield/.nextflow/assets/pgscatalog/pgsc_calc/assets/ancestry/high-LD-regions-hg19-GRCh37.txt
ld_grch38 : /Users/bwingfield/.nextflow/assets/pgscatalog/pgsc_calc/assets/ancestry/high-LD-regions-hg38-GRCh38.txt
ancestry_checksums : /Users/bwingfield/.nextflow/assets/pgscatalog/pgsc_calc/assets/ancestry/checksums.txt
Compatibility options
target_build : GRCh37
Max job request options
max_cpus : 2
max_memory : 6.GB
max_time : 6.h
Other parameters
config_profile_name : Test profile
config_profile_description: Minimal test dataset to check pipeline function
!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
If you use pgscatalog/pgsc_calc for your analysis please cite:
* The Polygenic Score Catalog
https://doi.org/10.1038/s41588-021-00783-5
* The nf-core framework
https://doi.org/10.1038/s41587-020-0439-x
* Software dependencies
https://github.com/pgscatalog/pgsc_calc/blob/master/CITATIONS.md
[b5/2b0584] Submitted process > PGSCATALOG_PGSCCALC:PGSCCALC:MAKE_COMPATIBLE:PLINK2_RELABELPVAR (cineca chromosome 22)
[54/5b2ef5] Submitted process > PGSCATALOG_PGSCCALC:PGSCCALC:INPUT_CHECK:COMBINE_SCOREFILES (1)
[22/8efca4] Submitted process > PGSCATALOG_PGSCCALC:PGSCCALC:MATCH:MATCH_VARIANTS (cineca chromosome 22)
[ee/f16912] Submitted process > PGSCATALOG_PGSCCALC:PGSCCALC:MATCH:MATCH_COMBINE (cineca)
[b9/aa6036] Submitted process > PGSCATALOG_PGSCCALC:PGSCCALC:APPLY_SCORE:PLINK2_SCORE (cineca chromosome 22 effect type additive 0)
[4d/c20fb0] Submitted process > PGSCATALOG_PGSCCALC:PGSCCALC:APPLY_SCORE:SCORE_AGGREGATE (cineca)
[f0/e5f070] Submitted process > PGSCATALOG_PGSCCALC:PGSCCALC:REPORT:SCORE_REPORT (cineca)
[59/71755b] Submitted process > PGSCATALOG_PGSCCALC:PGSCCALC:DUMPSOFTWAREVERSIONS (1)
-[pgscatalog/pgsc_calc] Pipeline completed successfully-If you can see the message Pipeline completed successfully, that means you’ve:
- Installed the PGS Catalog Calculator
- Calculated some polygenic scores
Well done! 🎉
If things don’t look quite right, please open a discussion or issue on GitHub describing your problem
Results
You can check the results folder in the same directory you ran the calculator from to look at the calculator outputs.
However, the test results are not biologically meaningful. They’re calculated from small synthetic data. The purpose of the test profile is to install the calculator and test all of the components are working correctly on your computer, ready for imputed human genomes.
Next steps
If you’d like apply the calculator to your genomic data, please check our documentation.
The calculator currently supports imputed human genomes.
Using unimputed array or whole-genome sequencing data with the calculator will often result in errors. These errors aren’t a bug with the calculator, they’re caused by the format and structure of the input genomes.
There are some things you need to do to prepare your genomes for PGS calculation
The documentation also contains things like:
- How-to guides
- Explanations about adjusting PGS and estimating genetic ancestry
- Explanations about the calculator output
- Common problems people experience
This tutorial describes using the calculator with docker because it’s the simplest approach on an ordinary computer
However, some computers can’t use docker - like HPC clusters or Trusted Research Environments (TRE) - so the calculator also supports Singularity/Apptainer and Anaconda